 target measure


On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures

arXiv.org Machine Learning

Despite the remarkable empirical success of generative models, the available theory on their statistical accuracy in scientific computing remains largely pessimistic. This paper develops a theoretical framework for understanding the regularity of transport maps and the generalization properties of one-step Wasserstein-guided generative models for PDE-induced probability measures. We consider normalized target densities associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker-Planck equations on the torus. Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a representative instance, we study DeepParticle and derive excess-risk bounds characterizing the discrepancy between the learned map and the population-optimal map. We also establish a robustness estimate under target shift and illustrate the theory with experiments that support the derived rates.


Constrained Density Estimation via Optimal Transport

arXiv.org Machine Learning

The classical optimal transport (OT) problem seeks the map that moves mass from a source to a target measure while minimizing a prescribed cost function. The objective can be formalized in either Monge's formulation [12] or Kantorovich's formulation [10], a convex relaxation of the former that considers transport plans instead of deterministic maps. These foundational formulations have wide-ranging applications, including in economics [7] and machine learning [14]. In many practical scenarios, the source measure is known or readily inferable from empirical data, but the target measure is not explicitly specified. Instead, it is only constrained by practical requirements or expert knowledge. For example, when applying Monge's formulation to transportation problems, the placement of the mass in the target region may be constrained to lie entirely beyond a certain boundary or within a particular region, rather than by the specification of a precise location for each fraction of the total mass. Similarly, in economic applications, supply and demand may be subject to constraints such as maximal amounts available or minimal amounts required, rather than dictated through precise marginal distributions.
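
For reference, the two formulations named in this abstract are usually written as below. The notation is the standard textbook one and is an assumption here, not taken from the paper: a source measure \mu on X, a target measure \nu on Y, a cost c(x, y), the pushforward T_{\#}\mu, and the set \Pi(\mu, \nu) of couplings whose marginals are \mu and \nu.

```latex
% Monge formulation: minimize over maps T that push \mu forward to \nu
\inf_{T \,:\, T_{\#}\mu = \nu} \int_X c\bigl(x, T(x)\bigr)\, d\mu(x)

% Kantorovich relaxation: minimize over couplings \pi with marginals \mu and \nu
\inf_{\pi \in \Pi(\mu, \nu)} \int_{X \times Y} c(x, y)\, d\pi(x, y)
```

Every admissible map T induces the coupling (\mathrm{id}, T)_{\#}\mu, which is why the second problem relaxes the first. In the constrained setting the abstract describes, \nu itself would not be fixed but only restricted to an admissible set.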


Evolution of Gaussians in the Hellinger-Kantorovich-Boltzmann gradient flow

arXiv.org Machine Learning

This study leverages the basic insight that the gradient-flow equation associated with the relative Boltzmann entropy, in relation to a Gaussian reference measure within the Hellinger-Kantorovich (HK) geometry, preserves the class of Gaussian measures. This invariance serves as the foundation for constructing a reduced gradient structure on the parameter space characterizing Gaussian densities. We derive explicit ordinary differential equations that govern the evolution of mean, covariance, and mass under the HK-Boltzmann gradient flow. The reduced structure retains the additive form of the HK metric, facilitating a comprehensive analysis of the dynamics involved. We explore the geodesic convexity of the reduced system, revealing that global convexity is confined to the pure transport scenario, while a variant of sublevel semi-convexity is observed in the general case. Furthermore, we demonstrate exponential convergence to equilibrium through Polyak-Lojasiewicz-type inequalities, applicable both globally and on sublevel sets. By monitoring the evolution of covariance eigenvalues, we refine the decay rates associated with convergence. Additionally, we extend our analysis to non-Gaussian targets exhibiting strong log-lambda-concavity, corroborating our theoretical results with numerical experiments that encompass a Gaussian-target gradient flow and a Bayesian logistic regression application.


Numerical and statistical analysis of NeuralODE with Runge-Kutta time integration

arXiv.org Artificial Intelligence

NeuralODE is one example of generative machine learning based on the push forward of a simple source measure under a bijective mapping, which in the case of NeuralODE is given by the flow of an ordinary differential equation. Using Liouville's formula, the log-density of the push-forward measure is easy to compute, and thus NeuralODE can be trained with the maximum likelihood method such that the Kullback-Leibler divergence between the push forward through the flow map and the target measure generating the data becomes small. In this work, we give a detailed account of the consistency of maximum-likelihood-based empirical risk minimization for a generic class of target measures. In contrast to prior work, we not only consider the statistical learning theory, but also give a detailed numerical analysis of the NeuralODE algorithm based on 2nd-order Runge-Kutta (RK) time integration. Using the universal approximation theory for deep ReQU networks, the stability and convergence rates for the RK scheme, as well as metric entropy and concentration inequalities, we are able to prove that NeuralODE is a probably approximately correct (PAC) learning algorithm.
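
The combination of a flow map and Liouville's formula described in this abstract can be illustrated with a minimal sketch. Several things here are assumptions made for illustration and are not taken from the paper: the midpoint variant of second-order Runge-Kutta (the abstract does not say which RK2 scheme is analyzed), the hypothetical helper name `rk2_flow`, and a toy linear vector field f(x) = -x standing in for a trained network. The sketch integrates dx/dt = f(x, t) together with d(log p)/dt = -div f(x, t).

```python
import math


def rk2_flow(f, div_f, x0, logp0, t_end=1.0, n_steps=100):
    """Push a point x0 through the flow of dx/dt = f(x, t) with midpoint RK2.

    The log-density is transported alongside via Liouville's formula,
    d(log p)/dt = -div f, using the midpoint rule for the time integral.
    """
    h = t_end / n_steps
    x, logp, t = list(x0), logp0, 0.0
    for _ in range(n_steps):
        k1 = f(x, t)
        x_mid = [xi + 0.5 * h * ki for xi, ki in zip(x, k1)]
        k2 = f(x_mid, t + 0.5 * h)
        logp -= h * div_f(x_mid, t + 0.5 * h)
        x = [xi + h * ki for xi, ki in zip(x, k2)]
        t += h
    return x, logp


# Toy vector field (an assumption, not a neural network): f(x) = -x in d = 2.
def toy_field(x, t):
    return [-xi for xi in x]


def toy_divergence(x, t):
    return -float(len(x))


# Source measure: standard Gaussian in two dimensions.
x_start = [1.0, -0.5]
logp_start = -0.5 * sum(xi * xi for xi in x_start) - math.log(2.0 * math.pi)

x_end, logp_end = rk2_flow(toy_field, toy_divergence, x_start, logp_start)
```

For this toy field the exact flow is x(t) = x0 exp(-t) and the log-density grows by d * t along a trajectory, so the numerical result can be checked against a closed form. In an actual NeuralODE the vector field and its divergence would come from the network, and `logp_end` would enter the maximum likelihood objective.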


Sampling through Algorithmic Diffusion in non-convex Perceptron problems

arXiv.org Artificial Intelligence

We analyze the problem of sampling from the solution space of simple yet non-convex neural network models by employing a denoising diffusion process known as Algorithmic Stochastic Localization, where the score function is provided by Approximate Message Passing. We introduce a formalism based on the replica method to characterize the process in the infinite-size limit in terms of a few order parameters, and, in particular, we provide criteria for the feasibility of sampling. We show that, in the case of the spherical perceptron problem with negative stability, approximate uniform sampling is achievable across the entire replica symmetric region of the phase diagram. In contrast, for the binary perceptron, uniform sampling via diffusion invariably fails due to the overlap gap property exhibited by the typical set of solutions. We discuss the first steps in defining alternative measures that can be efficiently sampled.


Review for NeurIPS paper: Deep Diffusion-Invariant Wasserstein Distributional Classification

Neural Information Processing Systems

Strengths: * The paper addresses the interesting concept of a classification problem in which inputs and targets are both represented by probability measures. The setting is novel and the proposed derivations are theoretically and empirically supported. The proposed architecture comprises two networks: a measure-to-measure mapping network f, which realizes a push-forward operation, and a prediction network g, which takes the measure produced by f as input and predicts the final label. An explicit formulation of the diffusion operator amenable to computation in a deep learning setting is described. A theoretical justification of the exponential decay of the diffusion-invariance term is also provided.


Measure-to-measure interpolation using Transformers

arXiv.org Machine Learning

Transformers are deep neural network architectures that underpin the recent successes of large language models. Unlike more classical architectures that can be viewed as point-to-point maps, a Transformer acts as a measure-to-measure map implemented as a specific interacting particle system on the unit sphere: the input is the empirical measure of tokens in a prompt and its evolution is governed by the continuity equation. In fact, Transformers are not limited to empirical measures and can in principle process any input measure. As the nature of data processed by Transformers is expanding rapidly, it is important to investigate their expressive power as maps from an arbitrary measure to another arbitrary measure. To that end, we provide an explicit choice of parameters that allows a single Transformer to match $N$ arbitrary input measures to $N$ arbitrary target measures, under the minimal assumption that every pair of input-target measures can be matched by some transport map.


Optimal Control of Agent-Based Dynamics under Deep Galerkin Feedback Laws

arXiv.org Artificial Intelligence

Ever since the concepts of dynamic programming were introduced, one of the most difficult challenges has been to adequately address high-dimensional control problems. With growing dimensionality, the utilisation of Deep Neural Networks promises to circumvent the issue of an otherwise exponentially increasing complexity. The paper specifically investigates the sampling issues to which the Deep Galerkin Method is subject. It proposes a drift relaxation-based sampling approach to alleviate the symptoms of high-variance policy approximations. This is validated on mean-field control problems, namely variations of the opinion dynamics presented by the Sznajd and the Hegselmann-Krause models. The resulting policies induce a significant cost reduction over manually optimised control functions and show improvements on the Linear-Quadratic Regulator problem over the Deep FBSDE approach.


Approximation properties of slice-matching operators

arXiv.org Machine Learning

Iterative slice-matching procedures are efficient schemes for transferring a source measure to a target measure, especially in high dimensions. These schemes have been successfully used in applications such as color transfer and shape retrieval, and are guaranteed to converge under regularity assumptions. In this paper, we explore approximation properties related to a single step of such iterative schemes by examining an associated slice-matching operator, depending on a source measure, a target measure, and slicing directions. In particular, we demonstrate an invariance property with respect to the source measure, an equivariance property with respect to the target measure, and Lipschitz continuity with respect to the slicing directions. We furthermore establish error bounds corresponding to approximating the target measure by one step of the slice-matching scheme and characterize situations in which the slice-matching operator recovers the optimal transport map between two measures. We also investigate connections to affine registration problems with respect to (sliced) Wasserstein distances. These connections can also be viewed as extensions to the invariance and equivariance properties of the slice-matching operator and illustrate the extent to which slice-matching schemes incorporate affine effects.
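
A single slice-matching step of the kind this abstract studies can be sketched for point clouds as follows. This is a hedged illustration rather than the paper's operator: it assumes equal-size empirical measures (so that each one-dimensional optimal transport map reduces to matching sorted projections), an orthonormal set of slicing directions, and a hypothetical function name `slice_matching_step`.

```python
import numpy as np


def slice_matching_step(source, target, directions):
    """One slice-matching step between equal-size point clouds.

    source, target: arrays of shape (n, d).
    directions: array of shape (d, d) whose rows form an orthonormal basis.
    Each point is rebuilt from its 1D-transported coordinates along the rows.
    """
    source = np.asarray(source, dtype=float)
    target = np.asarray(target, dtype=float)
    moved = np.zeros_like(source)
    for theta in directions:
        s_proj = source @ theta
        t_proj = target @ theta
        # 1D optimal transport between equal-size samples: the k-th smallest
        # source projection is sent to the k-th smallest target projection.
        order = np.argsort(s_proj)
        matched = np.empty_like(s_proj)
        matched[order] = np.sort(t_proj)
        moved += np.outer(matched, theta)
    return moved


# Usage on synthetic data: the target is a shuffled, translated copy of the source.
rng = np.random.default_rng(0)
source_pts = rng.normal(size=(50, 3))
shift = np.array([1.0, -2.0, 0.5])
target_pts = source_pts[rng.permutation(50)] + shift
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal matrix
moved_pts = slice_matching_step(source_pts, target_pts, q.T)
```

When the target is a pure translation of the source, as in this usage example, each projected map is a shift by the corresponding component of the translation, so one step moves every source point by exactly `shift`. This is one simple case in which a single step already recovers the optimal transport map.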


Sampling with Mollified Interaction Energy Descent

arXiv.org Artificial Intelligence

Sampling from a target measure whose density is only known up to a normalization constant is a fundamental problem in computational statistics and machine learning. In this paper, we present a new optimization-based method for sampling called mollified interaction energy descent (MIED). MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs). These energies rely on mollifier functions, smooth approximations of the Dirac delta originating from PDE theory. We show that as the mollifier approaches the Dirac delta, the MIE converges to the chi-square divergence with respect to the target measure and the minimizers of MIE converge to the target measure. Optimizing this energy with proper discretization yields a practical first-order particle-based algorithm for sampling in both unconstrained and constrained domains. We show experimentally that for unconstrained sampling problems, our algorithm performs on par with existing particle-based algorithms like SVGD, while for constrained sampling problems our method readily incorporates constrained optimization techniques to handle more flexible constraints with strong performance compared to alternatives. Sampling from an unnormalized probability density is a ubiquitous task in statistics, mathematical physics, and machine learning. While Markov chain Monte Carlo (MCMC) methods (Brooks et al., 2011) provide a way to obtain unbiased samples at the price of potentially long mixing times, variational inference (VI) methods (Blei et al., 2017) approximate the target measure with simpler (e.g., parametric) distributions at a lower computational cost. In this work, we focus on a particular class of VI methods that approximate the target measure using a collection of interacting particles.